Background

Open-Meteo maintains an API for historical weather that allows for non-commercial usage of historical weather data maintained by the website.

This file builds on _v001, _v002, _v003, and _v004 to run exploratory analysis on some historical weather data.

Functions and Libraries

The exploration process uses tidyverse, ranger, several generic custom functions, and several functions specific to Open Meteo processing. First, tidyverse, ranger, and the generic functions are loaded:

library(tidyverse) # tidyverse functionality is included throughout
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ranger) # predict() does not work on ranger objects unless ranger has been called

source("./Generic_Added_Utility_Functions_202105_v001.R") # Basic functions

Next, specific functions written in _v001, _v002, _v003, and _v004 are sourced:

source("./SimpleOneVar_Functions_202411_v001.R") # Functions for basic single variable analysis
source("./OpenMeteo_NextBest_202411_v001.R") # Functions for finding 'next best' predictor given existing model
source("./OpenMeteo_Functions_202411_v001.R") # Core functions for loading, processing, analysis of Open Meteo

Key mapping tables for available metrics are also copied:

hourlyMetrics <- "temperature_2m,relativehumidity_2m,dewpoint_2m,apparent_temperature,pressure_msl,surface_pressure,precipitation,rain,snowfall,cloudcover,cloudcover_low,cloudcover_mid,cloudcover_high,shortwave_radiation,direct_radiation,direct_normal_irradiance,diffuse_radiation,windspeed_10m,windspeed_100m,winddirection_10m,winddirection_100m,windgusts_10m,et0_fao_evapotranspiration,weathercode,vapor_pressure_deficit,soil_temperature_0_to_7cm,soil_temperature_7_to_28cm,soil_temperature_28_to_100cm,soil_temperature_100_to_255cm,soil_moisture_0_to_7cm,soil_moisture_7_to_28cm,soil_moisture_28_to_100cm,soil_moisture_100_to_255cm"
dailyMetrics <- "weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration"

hourlyDescription <- "Air temperature at 2 meters above ground\nRelative humidity at 2 meters above ground\nDew point temperature at 2 meters above ground\nApparent temperature is the perceived feels-like temperature combining wind chill factor, relative humidity and solar radiation\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nTotal precipitation (rain, showers, snow) sum of the preceding hour. Data is stored with a 0.1 mm precision. If precipitation data is summed up to monthly sums, there might be small inconsistencies with the total precipitation amount.\nOnly liquid precipitation of the preceding hour including local showers and rain from large scale systems.\nSnowfall amount of the preceding hour in centimeters. For the water equivalent in millimeter, divide by 7. E.g. 7 cm snow = 10 mm precipitation water equivalent\nTotal cloud cover as an area fraction\nLow level clouds and fog up to 2 km altitude\nMid level clouds from 2 to 6 km altitude\nHigh level clouds from 6 km altitude\nShortwave solar radiation as average of the preceding hour. This is equal to the total global horizontal irradiation\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDiffuse solar radiation as average of the preceding hour\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind direction at 10 or 100 meters above ground\nWind direction at 10 or 100 meters above ground\nGusts at 10 meters above ground of the indicated hour. Wind gusts in CERRA are defined as the maximum wind gusts of the preceding hour. Please consult the ECMWF IFS documentation for more information on how wind gusts are parameterized in weather models.\nET0 Reference Evapotranspiration of a well watered grass field. Based on FAO-56 Penman-Monteith equations ET0 is calculated from temperature, wind speed, humidity and solar radiation. Unlimited soil water is assumed. ET0 is commonly used to estimate the required irrigation for plants.\nWeather condition as a numeric code. Follow WMO weather interpretation codes. See table below for details. Weather code is calculated from cloud cover analysis, precipitation and snowfall. As barely no information about atmospheric stability is available, estimation about thunderstorms is not possible.\nVapor Pressure Deificit (VPD) in kilopascal (kPa). For high VPD (>1.6), water transpiration of plants increases. For low VPD (<0.4), transpiration decreases\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths."
dailyDescription <- "The most severe weather condition on a given day\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily apparent temperature\nMaximum and minimum daily apparent temperature\nSum of daily precipitation (including rain, showers and snowfall)\nSum of daily rain\nSum of daily snowfall\nThe number of hours with rain\nSun rise and set times\nSun rise and set times\nMaximum wind speed and gusts on a day\nMaximum wind speed and gusts on a day\nDominant wind direction\nThe sum of solar radiaion on a given day in Megajoules\nDaily sum of ET0 Reference Evapotranspiration of a well watered grass field"

# Create tibble for hourly metrics
tblMetricsHourly <- tibble::tibble(metric=hourlyMetrics %>% str_split_1(","), 
                                   description=hourlyDescription %>% str_split_1("\n")
                                   )
tblMetricsHourly %>% 
    print(n=50)
## # A tibble: 33 × 2
##    metric                        description                                    
##    <chr>                         <chr>                                          
##  1 temperature_2m                Air temperature at 2 meters above ground       
##  2 relativehumidity_2m           Relative humidity at 2 meters above ground     
##  3 dewpoint_2m                   Dew point temperature at 2 meters above ground 
##  4 apparent_temperature          Apparent temperature is the perceived feels-li…
##  5 pressure_msl                  Atmospheric air pressure reduced to mean sea l…
##  6 surface_pressure              Atmospheric air pressure reduced to mean sea l…
##  7 precipitation                 Total precipitation (rain, showers, snow) sum …
##  8 rain                          Only liquid precipitation of the preceding hou…
##  9 snowfall                      Snowfall amount of the preceding hour in centi…
## 10 cloudcover                    Total cloud cover as an area fraction          
## 11 cloudcover_low                Low level clouds and fog up to 2 km altitude   
## 12 cloudcover_mid                Mid level clouds from 2 to 6 km altitude       
## 13 cloudcover_high               High level clouds from 6 km altitude           
## 14 shortwave_radiation           Shortwave solar radiation as average of the pr…
## 15 direct_radiation              Direct solar radiation as average of the prece…
## 16 direct_normal_irradiance      Direct solar radiation as average of the prece…
## 17 diffuse_radiation             Diffuse solar radiation as average of the prec…
## 18 windspeed_10m                 Wind speed at 10 or 100 meters above ground. W…
## 19 windspeed_100m                Wind speed at 10 or 100 meters above ground. W…
## 20 winddirection_10m             Wind direction at 10 or 100 meters above ground
## 21 winddirection_100m            Wind direction at 10 or 100 meters above ground
## 22 windgusts_10m                 Gusts at 10 meters above ground of the indicat…
## 23 et0_fao_evapotranspiration    ET0 Reference Evapotranspiration of a well wat…
## 24 weathercode                   Weather condition as a numeric code. Follow WM…
## 25 vapor_pressure_deficit        Vapor Pressure Deificit (VPD) in kilopascal (k…
## 26 soil_temperature_0_to_7cm     Average temperature of different soil levels b…
## 27 soil_temperature_7_to_28cm    Average temperature of different soil levels b…
## 28 soil_temperature_28_to_100cm  Average temperature of different soil levels b…
## 29 soil_temperature_100_to_255cm Average temperature of different soil levels b…
## 30 soil_moisture_0_to_7cm        Average soil water content as volumetric mixin…
## 31 soil_moisture_7_to_28cm       Average soil water content as volumetric mixin…
## 32 soil_moisture_28_to_100cm     Average soil water content as volumetric mixin…
## 33 soil_moisture_100_to_255cm    Average soil water content as volumetric mixin…
# Create tibble for daily metrics
tblMetricsDaily <- tibble::tibble(metric=dailyMetrics %>% str_split_1(","), 
                                  description=dailyDescription %>% str_split_1("\n")
                                   )
tblMetricsDaily
## # A tibble: 16 × 2
##    metric                     description                                       
##    <chr>                      <chr>                                             
##  1 weathercode                The most severe weather condition on a given day  
##  2 temperature_2m_max         Maximum and minimum daily air temperature at 2 me…
##  3 temperature_2m_min         Maximum and minimum daily air temperature at 2 me…
##  4 apparent_temperature_max   Maximum and minimum daily apparent temperature    
##  5 apparent_temperature_min   Maximum and minimum daily apparent temperature    
##  6 precipitation_sum          Sum of daily precipitation (including rain, showe…
##  7 rain_sum                   Sum of daily rain                                 
##  8 snowfall_sum               Sum of daily snowfall                             
##  9 precipitation_hours        The number of hours with rain                     
## 10 sunrise                    Sun rise and set times                            
## 11 sunset                     Sun rise and set times                            
## 12 windspeed_10m_max          Maximum wind speed and gusts on a day             
## 13 windgusts_10m_max          Maximum wind speed and gusts on a day             
## 14 winddirection_10m_dominant Dominant wind direction                           
## 15 shortwave_radiation_sum    The sum of solar radiaion on a given day in Megaj…
## 16 et0_fao_evapotranspiration Daily sum of ET0 Reference Evapotranspiration of …

A previously existing dataset is loaded, with key analysis variables defined in a vector:

# Load previous data
allCity <- readFromRDS("allCity_20241116")

# Get core training variables
varsTrain <- getVarsTrain(allCity)
varsTrain
##  [1] "hour"                          "temperature_2m"               
##  [3] "relativehumidity_2m"           "dewpoint_2m"                  
##  [5] "apparent_temperature"          "pressure_msl"                 
##  [7] "surface_pressure"              "precipitation"                
##  [9] "rain"                          "snowfall"                     
## [11] "cloudcover"                    "cloudcover_low"               
## [13] "cloudcover_mid"                "cloudcover_high"              
## [15] "shortwave_radiation"           "direct_radiation"             
## [17] "direct_normal_irradiance"      "diffuse_radiation"            
## [19] "windspeed_10m"                 "windspeed_100m"               
## [21] "winddirection_10m"             "winddirection_100m"           
## [23] "windgusts_10m"                 "et0_fao_evapotranspiration"   
## [25] "weathercode"                   "vapor_pressure_deficit"       
## [27] "soil_temperature_0_to_7cm"     "soil_temperature_7_to_28cm"   
## [29] "soil_temperature_28_to_100cm"  "soil_temperature_100_to_255cm"
## [31] "soil_moisture_0_to_7cm"        "soil_moisture_7_to_28cm"      
## [33] "soil_moisture_28_to_100cm"     "soil_moisture_100_to_255cm"   
## [35] "year"                          "doy"
# Assign default label
keyLabel <- genericKeyLabelOM()
keyLabel
## [1] "predictions based on pre-2022 training data applied to 2022 holdout dataset"

The correlation heatmap is reproduced, with functions that borrowing from the recipe at STHDA:

# Default function
corVarsTrain <- makeHeatMap(allCity, vecSelect=varsTrain, returnData=TRUE)

corVarsTrain %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
##    Var1                          Var2                         value
##    <fct>                         <fct>                        <dbl>
##  1 rain                          precipitation                0.989
##  2 apparent_temperature          temperature_2m               0.984
##  3 direct_radiation              shortwave_radiation          0.974
##  4 temperature_2m                soil_temperature_0_to_7cm    0.962
##  5 soil_temperature_28_to_100cm  soil_temperature_7_to_28cm   0.952
##  6 shortwave_radiation           et0_fao_evapotranspiration   0.944
##  7 apparent_temperature          soil_temperature_0_to_7cm    0.942
##  8 soil_moisture_7_to_28cm       soil_moisture_0_to_7cm       0.940
##  9 windspeed_100m                windspeed_10m                0.940
## 10 direct_radiation              et0_fao_evapotranspiration   0.928
## 11 direct_radiation              direct_normal_irradiance     0.922
## 12 soil_moisture_28_to_100cm     soil_moisture_7_to_28cm      0.922
## 13 soil_temperature_0_to_7cm     soil_temperature_7_to_28cm   0.919
## 14 apparent_temperature          soil_temperature_7_to_28cm   0.918
## 15 temperature_2m                soil_temperature_7_to_28cm   0.917
## 16 shortwave_radiation           direct_normal_irradiance     0.901
## 17 windspeed_10m                 windgusts_10m                0.884
## 18 soil_temperature_100_to_255cm soil_temperature_28_to_100cm 0.863
## 19 soil_moisture_100_to_255cm    soil_moisture_28_to_100cm    0.859
## 20 apparent_temperature          soil_temperature_28_to_100cm 0.853
## # ℹ 610 more rows

The correlation heatmap is produced for a single city:

corVarsTrainBOS <- makeHeatMap(allCity %>% filter(src=="Boston"), vecSelect=varsTrain, returnData=TRUE)

corVarsTrainBOS %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
##    Var1                         Var2                         value
##    <fct>                        <fct>                        <dbl>
##  1 surface_pressure             pressure_msl                 1.00 
##  2 apparent_temperature         temperature_2m               0.993
##  3 temperature_2m               soil_temperature_0_to_7cm    0.965
##  4 apparent_temperature         soil_temperature_0_to_7cm    0.965
##  5 soil_temperature_0_to_7cm    soil_temperature_7_to_28cm   0.965
##  6 direct_radiation             shortwave_radiation          0.962
##  7 rain                         precipitation                0.959
##  8 soil_temperature_28_to_100cm soil_temperature_7_to_28cm   0.950
##  9 windspeed_100m               windspeed_10m                0.946
## 10 apparent_temperature         dewpoint_2m                  0.937
## 11 shortwave_radiation          et0_fao_evapotranspiration   0.935
## 12 windspeed_10m                windgusts_10m                0.927
## 13 apparent_temperature         soil_temperature_7_to_28cm   0.925
## 14 temperature_2m               dewpoint_2m                  0.917
## 15 direct_radiation             direct_normal_irradiance     0.916
## 16 soil_moisture_7_to_28cm      soil_moisture_0_to_7cm       0.915
## 17 temperature_2m               soil_temperature_7_to_28cm   0.914
## 18 direct_radiation             et0_fao_evapotranspiration   0.902
## 19 winddirection_100m           winddirection_10m            0.895
## 20 soil_temperature_0_to_7cm    soil_temperature_28_to_100cm 0.886
## # ℹ 610 more rows

Some variables, such as surface pressure vs. MSL pressure, are much differently correlated if controlling for city. Differences are explored:

tstCorDelta <- corVarsTrain %>% 
    mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
           v2=pmax(as.character(Var1), as.character(Var2))
           ) %>% 
    select(v1, v2, value_all=value) %>%
    full_join(corVarsTrainBOS %>% 
                  mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
                         v2=pmax(as.character(Var1), as.character(Var2))
                         ) %>% 
                  select(v1, v2, value_bos=value), 
              by=c("v1", "v2")
              ) %>%
    mutate(delta=value_bos-value_all)

tstCorDelta
## # A tibble: 666 × 5
##    v1                   v2                            value_all value_bos  delta
##    <chr>                <chr>                             <dbl>     <dbl>  <dbl>
##  1 dewpoint_2m          dewpoint_2m                     1          1      0     
##  2 dewpoint_2m          soil_temperature_7_to_28cm      0.675      0.874  0.199 
##  3 dewpoint_2m          soil_temperature_28_to_100cm    0.630      0.807  0.177 
##  4 dewpoint_2m          soil_temperature_0_to_7cm       0.623      0.875  0.252 
##  5 dewpoint_2m          temperature_2m                  0.691      0.917  0.226 
##  6 apparent_temperature dewpoint_2m                     0.785      0.937  0.152 
##  7 dewpoint_2m          soil_temperature_100_to_255cm   0.448      0.505  0.0567
##  8 dewpoint_2m          doy                             0.185      0.312  0.127 
##  9 dewpoint_2m          hour                            0.00764    0.0203 0.0127
## 10 dewpoint_2m          vapor_pressure_deficit         -0.0678     0.305  0.373 
## # ℹ 656 more rows
tstCorDelta %>% arrange(delta) %>% head(20)
## # A tibble: 20 × 5
##    v1                         v2                      value_all value_bos  delta
##    <chr>                      <chr>                       <dbl>     <dbl>  <dbl>
##  1 relativehumidity_2m        surface_pressure           0.578   -0.160   -0.738
##  2 soil_moisture_0_to_7cm     surface_pressure           0.594   -0.0914  -0.685
##  3 soil_moisture_7_to_28cm    surface_pressure           0.586   -0.0240  -0.610
##  4 dewpoint_2m                surface_pressure           0.364   -0.237   -0.601
##  5 soil_moisture_100_to_255cm surface_pressure           0.602    0.00609 -0.596
##  6 soil_moisture_28_to_100cm  surface_pressure           0.588    0.0215  -0.566
##  7 surface_pressure           windspeed_100m             0.247   -0.311   -0.558
##  8 dewpoint_2m                soil_moisture_28_to_10…    0.0416  -0.508   -0.549
##  9 surface_pressure           windspeed_10m              0.195   -0.304   -0.498
## 10 dewpoint_2m                soil_moisture_7_to_28cm    0.0435  -0.448   -0.491
## 11 surface_pressure           windgusts_10m              0.133   -0.332   -0.465
## 12 dewpoint_2m                soil_moisture_0_to_7cm     0.0722  -0.393   -0.465
## 13 relativehumidity_2m        soil_moisture_28_to_10…    0.401   -0.0532  -0.454
## 14 relativehumidity_2m        soil_moisture_100_to_2…    0.374   -0.0545  -0.429
## 15 cloudcover                 surface_pressure           0.279   -0.144   -0.423
## 16 relativehumidity_2m        soil_moisture_7_to_28cm    0.441    0.0357  -0.406
## 17 soil_moisture_0_to_7cm     soil_moisture_100_to_2…    0.796    0.402   -0.393
## 18 cloudcover_low             surface_pressure           0.197   -0.186   -0.383
## 19 surface_pressure           weathercode                0.140   -0.240   -0.380
## 20 cloudcover_mid             surface_pressure           0.142   -0.221   -0.362
tstCorDelta %>% arrange(desc(delta)) %>% head(20)
## # A tibble: 20 × 5
##    v1                         v2                       value_all value_bos delta
##    <chr>                      <chr>                        <dbl>     <dbl> <dbl>
##  1 soil_moisture_100_to_255cm year                        0.0387   0.739   0.701
##  2 pressure_msl               surface_pressure            0.401    1.00    0.599
##  3 soil_moisture_28_to_100cm  year                       -0.0677   0.386   0.454
##  4 soil_moisture_7_to_28cm    year                       -0.0426   0.406   0.449
##  5 surface_pressure           vapor_pressure_deficit     -0.493   -0.0499  0.443
##  6 dewpoint_2m                vapor_pressure_deficit     -0.0678   0.305   0.373
##  7 soil_moisture_100_to_255cm vapor_pressure_deficit     -0.337   -0.00259 0.334
##  8 soil_moisture_0_to_7cm     year                        0.0110   0.334   0.323
##  9 doy                        soil_temperature_100_to…    0.453    0.764   0.311
## 10 relativehumidity_2m        soil_temperature_0_to_7…   -0.253    0.00738 0.260
## 11 dewpoint_2m                soil_temperature_0_to_7…    0.623    0.875   0.252
## 12 relativehumidity_2m        soil_temperature_28_to_…   -0.133    0.116   0.249
## 13 relativehumidity_2m        temperature_2m             -0.213    0.0356  0.249
## 14 relativehumidity_2m        soil_temperature_7_to_2…   -0.139    0.109   0.247
## 15 relativehumidity_2m        soil_temperature_100_to…   -0.120    0.123   0.243
## 16 pressure_msl               vapor_pressure_deficit     -0.292   -0.0521  0.240
## 17 dewpoint_2m                temperature_2m              0.691    0.917   0.226
## 18 cloudcover                 temperature_2m             -0.162    0.0534  0.215
## 19 direct_normal_irradiance   surface_pressure           -0.118    0.0838  0.201
## 20 dewpoint_2m                soil_temperature_7_to_2…    0.675    0.874   0.199

The process is repeated, using an aggregation of correlations in each of the seven cities:

keyCities <- allCity %>% pull(src) %>% unique()
keyCities
## [1] "NYC"     "LA"      "Chicago" "Houston" "Vegas"   "Miami"   "Boston"
corVarsTrainEach <- keyCities %>% 
    map_dfr(.f=function(x) makeHeatMap(allCity %>% filter(src==x), 
                                       vecSelect=varsTrain[!varsTrain %in% ifelse(x=="Miami", "snowfall", "")],
                                       returnData=TRUE, 
                                       plotMap=FALSE
                                       ) %>% 
                mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
                       v2=pmax(as.character(Var1), as.character(Var2))
                       ) %>% 
                select(v1, v2, value), .id="src") %>%
    mutate(city=keyCities[as.integer(src)]) %>%
    select(src, city, everything())

corVarsTrainEach
## # A tibble: 4,626 × 5
##    src   city  v1                            v2                           value
##    <chr> <chr> <chr>                         <chr>                        <dbl>
##  1 1     NYC   soil_temperature_7_to_28cm    soil_temperature_7_to_28cm  1     
##  2 1     NYC   soil_temperature_28_to_100cm  soil_temperature_7_to_28cm  0.952 
##  3 1     NYC   dewpoint_2m                   soil_temperature_7_to_28cm  0.883 
##  4 1     NYC   soil_temperature_0_to_7cm     soil_temperature_7_to_28cm  0.954 
##  5 1     NYC   soil_temperature_7_to_28cm    temperature_2m              0.932 
##  6 1     NYC   apparent_temperature          soil_temperature_7_to_28cm  0.937 
##  7 1     NYC   soil_temperature_100_to_255cm soil_temperature_7_to_28cm  0.584 
##  8 1     NYC   doy                           soil_temperature_7_to_28cm  0.360 
##  9 1     NYC   cloudcover_mid                soil_temperature_7_to_28cm -0.100 
## 10 1     NYC   cloudcover_high               soil_temperature_7_to_28cm  0.0134
## # ℹ 4,616 more rows
corVarsTrainEach %>% 
    arrange(desc(value)) %>% 
    filter(v1!=v2) %>% 
    slice(1:15, (nrow(.)-14):nrow(.)) %>%
    print(n=30)
## # A tibble: 30 × 5
##    src   city    v1                         v2                             value
##    <chr> <chr>   <chr>                      <chr>                          <dbl>
##  1 6     Miami   precipitation              rain                           1    
##  2 6     Miami   pressure_msl               surface_pressure               1.00 
##  3 7     Boston  pressure_msl               surface_pressure               1.00 
##  4 2     LA      precipitation              rain                           1.00 
##  5 4     Houston pressure_msl               surface_pressure               1.00 
##  6 4     Houston precipitation              rain                           1.00 
##  7 1     NYC     pressure_msl               surface_pressure               1.00 
##  8 5     Vegas   precipitation              rain                           0.999
##  9 3     Chicago pressure_msl               surface_pressure               0.994
## 10 1     NYC     apparent_temperature       temperature_2m                 0.993
## 11 3     Chicago apparent_temperature       temperature_2m                 0.993
## 12 7     Boston  apparent_temperature       temperature_2m                 0.993
## 13 5     Vegas   direct_radiation           shortwave_radiation            0.989
## 14 5     Vegas   apparent_temperature       temperature_2m                 0.988
## 15 3     Chicago precipitation              rain                           0.988
## 16 1     NYC     soil_moisture_0_to_7cm     soil_temperature_7_to_28cm    -0.680
## 17 1     NYC     soil_moisture_7_to_28cm    soil_temperature_0_to_7cm     -0.681
## 18 1     NYC     soil_moisture_100_to_255cm soil_temperature_100_to_255cm -0.687
## 19 1     NYC     soil_moisture_28_to_100cm  soil_temperature_100_to_255cm -0.688
## 20 4     Houston et0_fao_evapotranspiration relativehumidity_2m           -0.689
## 21 1     NYC     soil_moisture_28_to_100cm  soil_temperature_7_to_28cm    -0.707
## 22 1     NYC     soil_moisture_7_to_28cm    soil_temperature_28_to_100cm  -0.715
## 23 1     NYC     soil_moisture_7_to_28cm    soil_temperature_7_to_28cm    -0.720
## 24 3     Chicago soil_moisture_100_to_255cm year                          -0.758
## 25 1     NYC     soil_moisture_28_to_100cm  soil_temperature_28_to_100cm  -0.777
## 26 4     Houston relativehumidity_2m        vapor_pressure_deficit        -0.791
## 27 2     LA      relativehumidity_2m        vapor_pressure_deficit        -0.818
## 28 5     Vegas   soil_moisture_28_to_100cm  year                          -0.828
## 29 5     Vegas   soil_moisture_7_to_28cm    year                          -0.831
## 30 6     Miami   relativehumidity_2m        vapor_pressure_deficit        -0.868

Comparison of aggregated individual city correlations to the overall correlations:

corVarsTrainDelta <- corVarsTrain %>% 
    mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
           v2=pmax(as.character(Var1), as.character(Var2))
    ) %>% 
    select(v1, v2, value_all=value) %>% 
    full_join(corVarsTrainEach %>% 
                  group_by(v1, v2) %>% 
                  summarize(value=mean(value), .groups="drop"), 
              by=c("v1", "v2")) %>% 
    mutate(delta=value-value_all) 

corVarsTrainDelta %>% 
    arrange(delta) %>% 
    slice(1:15, (nrow(.)-14):nrow(.)) %>%
    print(n=30)
## # A tibble: 30 × 5
##    v1                         v2                       value_all    value  delta
##    <chr>                      <chr>                        <dbl>    <dbl>  <dbl>
##  1 relativehumidity_2m        surface_pressure            0.578  -0.178   -0.756
##  2 dewpoint_2m                surface_pressure            0.364  -0.371   -0.735
##  3 soil_moisture_100_to_255cm soil_moisture_7_to_28cm     0.824   0.156   -0.668
##  4 soil_moisture_0_to_7cm     soil_moisture_100_to_25…    0.796   0.138   -0.658
##  5 soil_moisture_0_to_7cm     surface_pressure            0.594  -0.0212  -0.615
##  6 soil_moisture_100_to_255cm surface_pressure            0.602  -0.00359 -0.606
##  7 soil_moisture_100_to_255cm soil_moisture_28_to_100…    0.859   0.283   -0.576
##  8 soil_moisture_7_to_28cm    surface_pressure            0.586   0.0143  -0.572
##  9 soil_moisture_28_to_100cm  surface_pressure            0.588   0.0277  -0.560
## 10 cloudcover                 surface_pressure            0.279  -0.138   -0.417
## 11 surface_pressure           windspeed_100m              0.247  -0.165   -0.412
## 12 relativehumidity_2m        soil_moisture_100_to_25…    0.374  -0.00330 -0.378
## 13 soil_moisture_0_to_7cm     soil_moisture_28_to_100…    0.842   0.491   -0.351
## 14 surface_pressure           windspeed_10m               0.195  -0.149   -0.344
## 15 relativehumidity_2m        soil_moisture_28_to_100…    0.401   0.0674  -0.333
## 16 vapor_pressure_deficit     windspeed_10m              -0.0519  0.100    0.152
## 17 relativehumidity_2m        soil_temperature_100_to…   -0.120   0.0325   0.153
## 18 et0_fao_evapotranspiration soil_moisture_100_to_25…   -0.138   0.0221   0.160
## 19 diffuse_radiation          vapor_pressure_deficit      0.283   0.449    0.166
## 20 soil_moisture_28_to_100cm  vapor_pressure_deficit     -0.404  -0.234    0.170
## 21 soil_moisture_100_to_255cm temperature_2m             -0.204  -0.0213   0.183
## 22 soil_moisture_100_to_255cm soil_temperature_0_to_7…   -0.221  -0.0296   0.192
## 23 soil_moisture_100_to_255cm soil_temperature_28_to_…   -0.308  -0.115    0.193
## 24 dewpoint_2m                vapor_pressure_deficit     -0.0678  0.136    0.203
## 25 soil_moisture_100_to_255cm soil_temperature_7_to_2…   -0.257  -0.0527   0.205
## 26 direct_normal_irradiance   surface_pressure           -0.118   0.0875   0.205
## 27 doy                        soil_temperature_100_to…    0.453   0.739    0.286
## 28 soil_moisture_100_to_255cm vapor_pressure_deficit     -0.337  -0.0176   0.319
## 29 surface_pressure           vapor_pressure_deficit     -0.493  -0.0665   0.427
## 30 pressure_msl               surface_pressure            0.401   0.983    0.582

Differences by variable are explored:

corVarsTrain %>% 
    mutate(v1=pmin(as.character(Var1), as.character(Var2)), 
           v2=pmax(as.character(Var1), as.character(Var2))
    ) %>% 
    select(v1, v2, value_all=value) %>% 
    full_join(corVarsTrainEach, by=c("v1", "v2")) %>% 
    mutate(delta=value-value_all) %>%
    filter(v1!=v2) %>%
    select(city, v1, v2, delta) %>%
    bind_rows(., ., .id="src") %>%
    mutate(vrbl=case_when(src=="1"~v1, src=="2"~v2, TRUE~NA)) %>%
    ggplot(aes(x=fct_reorder(vrbl, delta, .fun=function(x) diff(range(x))), y=delta)) + 
    geom_boxplot(fill="lightblue") + 
    coord_flip() + 
    geom_hline(yintercept=0, color="black", lty=2) +
    labs(title="Correlation delta (individual minus all-city aggregate)", 
         x=NULL, 
         y="Delta (individual minus all-city aggregate)", 
         subtitle="Calculated for each variable combination and city"
         )

Soil moisture, surface pressure, and dewpoint are among the variables with the highest changes in correlation when calculated n individual cities and the all-city aggregate

An example Simpson’s paradox is MSL pressure vs. surface pressure:

allCity %>% 
    count(src, surface_pressure, pressure_msl) %>% 
    ggplot(aes(x=surface_pressure, pressure_msl)) + 
    geom_smooth(aes(weight=n, color=src), method="lm") + 
    geom_smooth(method="lm", lty=2, aes(weight=n), color="black") + 
    labs(title="Relationship between MSL pressure and surface pressure", 
         subtitle="Dashed black line is overall relationship"
         ) + 
    scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

The process is converted to functional form:

tmpSmoothPlot <- function(df, x, y, xName=x, yName=y, printPlot=TRUE, returnPlot=!isTRUE(printPlot)) {
    
    # FUNCTION ARGUMENTS:
    # df: the data frame
    # x: x variable
    # y: y variable
    # xName: name to describe x variable
    # yName: name to describe y variable
    # printPlot: boolean, should plot be printed?
    # returnPlot: boolean, should plot object be returned?
    
    p1 <- df %>% 
        select(src, all_of(c(x, y))) %>%
        purrr::set_names(c("src", "x1", "y1")) %>%
        count(src, x1, y1) %>% 
        ggplot(aes(x=x1, y=y1)) + 
        geom_smooth(aes(weight=n, color=src), method="lm") + 
        geom_smooth(method="lm", lty=2, aes(weight=n), color="black") + 
        labs(title=paste0("Relationship between ", xName, " and ", yName), 
             subtitle="Dashed black line is overall relationship", 
             y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
             x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
             ) + 
    scale_color_discrete(NULL)
    
    # Print plot if requested
    if(isTRUE(printPlot)) print(p1)
    
    # Return plot if requested
    if(isTRUE(returnPlot)) return(p1)
    
}

# Example function call
tmpSmoothPlot(allCity, 
              x="surface_pressure", 
              y="pressure_msl", 
              yName="MSL Pressure", 
              xName="Surface Pressure"
              )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

The function is tested on two additional sets of metrics:

gridExtra::grid.arrange(tmpSmoothPlot(allCity %>% 
                                          mutate(across(c("apparent_temperature", "dewpoint_2m"), 
                                                        .fns=function(x) round(x)
                                                        )
                                                 ), 
                                      x="dewpoint_2m", 
                                      y="apparent_temperature", 
                                      xName="Dew Point", 
                                      yName="Apparent Temperature", 
                                      printPlot=FALSE
                                      ), 
                        tmpSmoothPlot(allCity %>% 
                                          mutate(across(c("surface_pressure", "dewpoint_2m"), 
                                                        .fns=function(x) round(x)
                                                        )
                                                 ), 
                                      x="dewpoint_2m", 
                                      y="surface_pressure", 
                                      xName="Dew Point", 
                                      yName="Surface Pressure", 
                                      printPlot=FALSE
                                      ), 
                        nrow=1
                        )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

The function is updated to allow for rounding:

tmpSmoothPlot <- function(df, 
                          x, 
                          y, 
                          xRound=NULL, 
                          yRound=NULL,
                          xName=x, 
                          yName=y, 
                          printPlot=TRUE, 
                          returnPlot=!isTRUE(printPlot)
                          ) {
    
    # FUNCTION ARGUMENTS:
    # df: the data frame
    # x: x variable
    # y: y variable
    # {x,y}Round: rounding to apply to vector {x,y} using function autoRound()
    #             NULL means no rounding (default)
    #             -1L means make an estimate based on data (around 100 buckets created)
    #             a positive float or integer means round everything to the nearest multiple
    # xName: name to describe x variable
    # yName: name to describe y variable
    # printPlot: boolean, should plot be printed?
    # returnPlot: boolean, should plot object be returned?
    
    p1 <- df %>% 
        select(src, all_of(c(x, y))) %>%
        purrr::set_names(c("src", "x1", "y1")) %>%
        mutate(x1=autoRound(x1, rndTo=xRound), 
               y1=autoRound(y1, rndTo=yRound)
               ) %>%
        count(src, x1, y1) %>% 
        ggplot(aes(x=x1, y=y1)) + 
        geom_smooth(aes(weight=n, color=src), method="lm") + 
        geom_smooth(method="lm", lty=2, aes(weight=n), color="black") + 
        labs(title=paste0("Relationship between ", xName, " and ", yName), 
             subtitle="Dashed black line is overall relationship", 
             y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
             x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
             ) + 
    scale_color_discrete(NULL)
    
    # Print plot if requested
    if(isTRUE(printPlot)) print(p1)
    
    # Return plot if requested
    if(isTRUE(returnPlot)) return(p1)
    
}

The updated function is tested on elements with many buckets:

# Example function call (no rounding)
t0 <- Sys.time()
system.time(
    tmpSmoothPlot(allCity, 
                  x="direct_normal_irradiance", 
                  y="shortwave_radiation", 
                  xName="Direct Solar Radiation", 
                  yName="Shortwave Solar Radiation"
                  )
    )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

##    user  system elapsed 
##    3.69    0.38    4.12
Sys.time() - t0
## Time difference of 4.284883 secs
# Example function call (rounding)
t0 <- Sys.time()
system.time(
    tmpSmoothPlot(allCity, 
                  x="direct_normal_irradiance", 
                  y="shortwave_radiation", 
                  xRound=-1L, 
                  yRound=-1L,
                  xName="Direct Solar Radiation", 
                  yName="Shortwave Solar Radiation"
                  )
    )
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

##    user  system elapsed 
##    0.69    0.09    0.80
Sys.time() - t0
## Time difference of 0.9890871 secs

A simple model is run to predict surface pressure as a function of dewpoint, without considering location:

# Create model
tstLM <- allCity %>%
    select(dewpoint_2m, surface_pressure) %>%
    lm(surface_pressure ~ dewpoint_2m, data=.)

# Summary
summary(tstLM)
## 
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.042 -15.314   5.729  17.119  83.092 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.871e+02  3.631e-02 27182.2   <2e-16 ***
## dewpoint_2m 9.507e-01  2.632e-03   361.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.72 on 854206 degrees of freedom
## Multiple R-squared:  0.1325, Adjusted R-squared:  0.1325 
## F-statistic: 1.305e+05 on 1 and 854206 DF,  p-value: < 2.2e-16
# Review of predictions
allCity %>%
    select(src, dewpoint_2m, surface_pressure) %>%
    mutate(pred=predict(tstLM, newdata=.)) %>%
    mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
    count(surface_pressure, pred) %>%
    ggplot(aes(x=pred, y=surface_pressure)) + 
    geom_point(aes(size=n), alpha=0.25) + 
    geom_smooth(aes(weight=n), method="lm") + 
    geom_abline(slope=1, intercept=0, lty=2, color="red") + 
    labs(x="Prediction", 
         y="Actual", 
         title="Predictions for Surface Pressure", 
         subtitle="Surface Pressure ~ Dewpoint"
         ) + 
    scale_size_continuous(NULL)
## `geom_smooth()` using formula = 'y ~ x'

The model is updated to predict surface pressure as a function of dewpoint, considering location:

# Create model
tstLM_002 <- allCity %>%
    select(dewpoint_2m, surface_pressure, src) %>%
    lm(surface_pressure ~ dewpoint_2m:src + src, data=.)

# Summary
summary(tstLM_002)
## 
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m:src + src, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.856  -2.901   0.233   3.277  28.618 
## 
## Coefficients:
##                          Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)             1.017e+03  1.844e-02 55129.10   <2e-16 ***
## srcChicago             -2.021e+01  2.594e-02  -779.18   <2e-16 ***
## srcHouston              4.918e+00  4.032e-02   121.98   <2e-16 ***
## srcLA                  -3.904e+01  2.910e-02 -1341.57   <2e-16 ***
## srcMiami                5.985e+00  7.381e-02    81.09   <2e-16 ***
## srcNYC                 -2.917e+00  2.700e-02  -108.04   <2e-16 ***
## srcVegas               -8.141e+01  2.487e-02 -3273.38   <2e-16 ***
## dewpoint_2m:srcBoston  -1.907e-01  1.561e-03  -122.19   <2e-16 ***
## dewpoint_2m:srcChicago -2.323e-01  1.493e-03  -155.62   <2e-16 ***
## dewpoint_2m:srcHouston -4.411e-01  2.017e-03  -218.66   <2e-16 ***
## dewpoint_2m:srcLA      -2.287e-01  2.261e-03  -101.17   <2e-16 ***
## dewpoint_2m:srcMiami   -3.044e-01  3.512e-03   -86.66   <2e-16 ***
## dewpoint_2m:srcNYC     -1.990e-01  1.607e-03  -123.81   <2e-16 ***
## dewpoint_2m:srcVegas   -1.471e-01  2.120e-03   -69.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.725 on 854194 degrees of freedom
## Multiple R-squared:  0.9602, Adjusted R-squared:  0.9602 
## F-statistic: 1.585e+06 on 13 and 854194 DF,  p-value: < 2.2e-16
# Review of predictions
allCity %>%
    select(src, dewpoint_2m, surface_pressure) %>%
    mutate(pred=predict(tstLM_002, newdata=.)) %>%
    mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
    count(src, surface_pressure, pred) %>%
    ggplot(aes(x=pred, y=surface_pressure)) + 
    geom_point(aes(size=n, color=src), alpha=0.25) + 
    geom_smooth(aes(weight=n), method="lm") + 
    geom_abline(slope=1, intercept=0, lty=2, color="red") + 
    labs(x="Prediction", 
         y="Actual", 
         title="Predictions for Surface Pressure", 
         subtitle="Surface Pressure ~ Dewpoint:City + City"
         ) + 
    scale_size_continuous(NULL) + 
    scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'

Long term daily data is downloaded for Atlanta:

# Daily data download for Atlanta, GA
testURLDaily <- helperOpenMeteoURL(cityName="Atlanta GA", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Eastern"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=33.76&longitude=-84.42&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_atl.json")) {
    fileDownload(fileName="testOM_daily_atl.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_atl.json already exists, skipping download\n")
}
## 
## File testOM_daily_atl.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
atlOMDaily <- formatOpenMeteoJSON("testOM_daily_atl.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01          71                2.6                0.1
##  2 1960-01-02 1960-01-02          61               11.4                1.5
##  3 1960-01-03 1960-01-03          63               14.7                2  
##  4 1960-01-04 1960-01-04          53                7.3               -0.1
##  5 1960-01-05 1960-01-05          51                7.1                0.2
##  6 1960-01-06 1960-01-06          53                7.3                4.5
##  7 1960-01-07 1960-01-07          61                7.4                3.1
##  8 1960-01-08 1960-01-08           2               10.7                0.4
##  9 1960-01-09 1960-01-09           3               15                 -0.8
## 10 1960-01-10 1960-01-10           3               17.5                4  
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     33.8     -84.4              388.             -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 33.77856
## longitude: -84.40299
## generationtime_ms: 388.082
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: EST
## elevation: 302
# Sample records of tibble
atlOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max         <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min         <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max   <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min   <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum          <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum                   <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise                    <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset                     <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ windspeed_10m_max          <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max          <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum    <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…

Variables are converted to proper data type:

dfDailyATL <- atlOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyATL)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max         <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min         <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max   <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min   <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum          <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum                   <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum               <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise                    <dttm> 1960-01-01 07:42:00, 1960-01-02 07:42:00, …
## $ sunset                     <dttm> 1960-01-01 17:39:00, 1960-01-02 17:40:00, …
## $ windspeed_10m_max          <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max          <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum    <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ fct_winddir                <fct> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Atlanta 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Atlanta 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Atlanta 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Atlanta 1960-2023)"
         )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Atlanta 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Atlanta 1960-2023)"
         )

dfDailyATL %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Atlanta 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyATL$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 365 737
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 185 549 916
pacfTemp <- pacf(dfDailyATL$temperature_2m_max, lag.max=1000)

pacf(dfDailyATL$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyATL$precipitation_sum, lag.max=1000)

acf(dfDailyATL$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  28  52  78 100 126 140 157 187 202 213 250 284 311 335 361 377 406 428 442
## [20] 465 481 502 523 540 553 585 609 636 650 680 692 710 728 744 757 769 791 803
## [39] 825 852 870 897 913 930 941 961 989
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  18  56  71  97 120 134 166 178 195 210 226 238 268 288 302 320 332 344 359
## [20] 383 409 420 438 452 471 483 497 533 551 565 581 598 613 633 654 676 695 719
## [39] 743 766 799 812 842 862 902 920 948 977
pacfPrecip <- pacf(dfDailyATL$precipitation_sum, lag.max=1000)

pacf(dfDailyATL$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value

ACF and PACF are explored for windspeed:

acfWind <- acf(dfDailyATL$windspeed_10m_max, lag.max=1000)

acf(dfDailyATL$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1]  13 172 183 338 361 378 553 738 894 921
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 174 185 555 730 891 910 944
pacfWind <- pacf(dfDailyATL$windspeed_10m_max, lag.max=1000)

pacf(dfDailyATL$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.2 for wind speed vs. ~0.75 for temperature)

A boxplot for delta daily temperature is created:

dfDailyATL %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyATL %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally larger in magnitude during the colder season

A boxplot for delta daily precipitation is created:

dfDailyATL %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyATL %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyATL %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.

Percentage of cumulative precipitation by volume of daily precipitation is explored:

tmpPlotData <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    group_by(precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) 

tmpPlotData %>%
    pivot_longer(cols=-c(precipitation_sum)) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(data=~filter(., name %in% c("cpp", "cpn")), 
              aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
              ) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric")

Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 25+ mm (about an inch) of precipitation

Total precipitation by decade is also explored, sorted by daily precipitation:

dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
    group_by(decade, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(decade) %>%
    mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
    ungroup() %>%
    ggplot(aes(x=precipitation_sum, y=meancsp)) + 
    geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) + 
    labs(x="Daily precipitation (mm)", 
         y="Cumulative annual precipitation (mm)", 
         title="Average annual precipitation by decade", 
         subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
         ) + 
    scale_color_discrete("Decade")

Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 202s, driving overall precipitation totals ~50% higher

Precipitation appears to have broken trend starting in the late 2010s:

dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>% 
    group_by(decade, year) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    ggplot(aes(x=year, y=precip)) + 
    geom_line(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), lwd=1) + 
    geom_smooth(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), method="lm", lty=2) + 
    geom_point() + 
    labs(x=NULL, y="Annual precipitation (mm)") + 
    scale_color_discrete(NULL) + 
    lims(y=c(0, NA))
## `geom_smooth()` using formula = 'y ~ x'

Precipitation by month pre-2017 is explored:

yMax <- dfDailyATL %>%
    group_by(year=year(date), month=month(date)) %>%
    summarize(precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(precip=ceiling(precip/50)*50) %>%
    pull() %>%
    max()
yMax
## [1] 350
p1 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", year)
           ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    group_by(tp, month) %>% 
    summarize(mu=mean(precip), sd=sd(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_col(data=~filter(., tp=="pre"), aes(y=mu), fill="lightblue") + 
    geom_errorbar(data=~filter(., tp=="pre"), aes(ymin=mu-sd, ymax=mu+sd), width=0.2) +
    geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) + 
    labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation +/- 1 SD\n(pre-2017)") + 
    lims(y=c(0, yMax))
p1

Precipitation by month post-2017 is explored:

p2 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", year)
           ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    filter(tp!="pre") %>%
    group_by(month) %>% 
    summarize(mu=mean(precip), mx=max(precip), mn=min(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_col(aes(y=mu), fill="lightblue") + 
    geom_errorbar(aes(ymin=mn, ymax=mx), width=0.2) +
    geom_hline(color="red", lty=2, aes(yintercept=mean(mu))) + 
    labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\nmax-mean-min (post-2017)") + 
    lims(y=c(0, yMax))
p2

The data are plotted on a single plot:

p3 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum), .groups="drop") %>% 
    group_by(tp, month) %>% 
    summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") + 
    geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) + 
    geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
    geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
    annotate("text", x=1, y=25, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) + 
    annotate("text", x=1, y=300, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) + 
    labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\n(pre/post-2017)") + 
    lims(y=c(0, yMax))
p3

The average number of days with precipitation by month is explored:

yMax <- dfDailyATL %>%
    group_by(year=year(date), month=month(date)) %>%
    summarize(precip=sum(precipitation_sum>0), .groups="drop") %>%
    mutate(precip=ceiling(precip/5)*5) %>%
    pull() %>%
    max()
yMax
## [1] 35
p4 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, year, month) %>% 
    summarize(precip=sum(precipitation_sum>0), .groups="drop") %>% 
    group_by(tp, month) %>% 
    summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>% 
    ggplot(aes(x=month)) + 
    geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") + 
    geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) + 
    geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
    geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
    annotate("text", x=1, y=5, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) + 
    annotate("text", x=1, y=30, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) + 
    labs(x=NULL, 
         y="Monthly precipitation (days greater than zero)", 
         title="Monthly days with precipitation\n(pre/post-2017)") + 
    lims(y=c(0, yMax))
p4

Median precipitation on days with precipitation > 0, by month, is explored:

p5 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, month) %>% 
    summarize(nprecip=sum(precipitation_sum>0), 
              tprecip=sum(precipitation_sum),
              mdnprecip=median(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
              muprecip=mean(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
              .groups="drop"
              ) %>% 
    select(tp, month, mdnprecip, muprecip) %>%
    pivot_longer(-c(tp, month)) %>%
    ggplot(aes(x=month)) + 
    geom_line(aes(y=value, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
    facet_wrap(~c("mdnprecip"="Median (mm) on days with precipitation", 
                  "muprecip"="Mean (mm) on days with precipitation"
                  )[name]
               ) +
    labs(x=NULL, 
         title="Monthly precipitation on days with precipitation greater than zero\n(pre/post-2017)", 
         y="Precipitation (mm)"
         ) + 
    ylim(c(0, NA)) + 
    scale_color_discrete(NULL)
p5

The heaviest 3-day precipitation by year is explored:

p6 <- dfDailyATL %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), 
           decade=round(year(date)-4.5, -1), 
           month=factor(month.abb[month(date)], levels=month.abb), 
           tp=ifelse(year<2017, "pre", "post")
    ) %>% 
    group_by(tp, year) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="r3precip", func=zoo::rollsum, k=3) %>%
    summarize(r3max=max(r3precip, na.rm=TRUE), .groups="drop") %>%
    ggplot(aes(x=year)) + 
    geom_line(aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
    labs(x=NULL, 
         title="Maximum 3-day precipitation by year", 
         y="rolling 3-day sum of precipitation (mm)"
         ) + 
    geom_smooth(method="lm", 
                se=TRUE, 
                aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])
                ) +
    ylim(c(0, NA)) + 
    scale_color_discrete(NULL)
p6
## `geom_smooth()` using formula = 'y ~ x'

Long term daily data is downloaded for Detroit:

# Daily data download for Detroit, MI
testURLDaily <- helperOpenMeteoURL(cityName="Detroit MI", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Eastern"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=42.38&longitude=-83.1&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_dtw.json")) {
    fileDownload(fileName="testOM_daily_dtw.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_dtw.json already exists, skipping download\n")
}
## 
## File testOM_daily_dtw.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
dtwOMDaily <- formatOpenMeteoJSON("testOM_daily_dtw.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01           3                0.4               -4.4
##  2 1960-01-02 1960-01-02          55                3.2               -2.5
##  3 1960-01-03 1960-01-03          51                2.7               -2.9
##  4 1960-01-04 1960-01-04           3               -3.3               -8.4
##  5 1960-01-05 1960-01-05           3               -6                 -9.7
##  6 1960-01-06 1960-01-06           3               -0.8              -10.7
##  7 1960-01-07 1960-01-07           3                4.1               -4.9
##  8 1960-01-08 1960-01-08          51                2.5               -6.7
##  9 1960-01-09 1960-01-09          51                1.8               -7.3
## 10 1960-01-10 1960-01-10          51                4.7               -2  
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     42.4     -83.1              372.             -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 42.35501
## longitude: -83.13785
## generationtime_ms: 371.5783
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: GMT-5
## elevation: 199
# Sample records of tibble
dtwOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max         <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min         <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max   <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min   <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum          <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum                   <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise                    <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset                     <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ windspeed_10m_max          <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max          <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum    <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…

Variables are converted to proper data type:

dfDailyDTW <- dtwOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyDTW)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max         <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min         <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max   <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min   <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum          <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum                   <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise                    <dttm> 1960-01-01 08:01:00, 1960-01-02 08:01:00, …
## $ sunset                     <dttm> 1960-01-01 17:10:00, 1960-01-02 17:10:00, …
## $ windspeed_10m_max          <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max          <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum    <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
## $ sunrise_chr                <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset_chr                 <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ fct_winddir                <fct> 109, 196, 261, 247, 257, 239, 223, 272, 161…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Detroit 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Detroit 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

The apparent decrease in days with northerly winds is explored further:

dfDailyWind <- dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=300~"1. N", 
                             winddirection_10m_dominant>240~"2. E/W", 
                             winddirection_10m_dominant>=120~"3. S", 
                             winddirection_10m_dominant>60~"2. E/W", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             )
           ) %>%
    select(year, month, date, winddir, winddirection_10m_dominant, windspeed_10m_max)
dfDailyWind
## # A tibble: 23,376 × 6
##     year month date       winddir winddirection_10m_dominant windspeed_10m_max
##    <dbl> <fct> <date>     <chr>                        <int>             <dbl>
##  1  1960 Jan   1960-01-01 2. E/W                         109              15  
##  2  1960 Jan   1960-01-02 3. S                           196              21.5
##  3  1960 Jan   1960-01-03 2. E/W                         261              25.6
##  4  1960 Jan   1960-01-04 2. E/W                         247              23  
##  5  1960 Jan   1960-01-05 2. E/W                         257              28.7
##  6  1960 Jan   1960-01-06 3. S                           239              30.3
##  7  1960 Jan   1960-01-07 3. S                           223              30.9
##  8  1960 Jan   1960-01-08 2. E/W                         272              32.8
##  9  1960 Jan   1960-01-09 3. S                           161              18.4
## 10  1960 Jan   1960-01-10 2. E/W                         284              10.2
## # ℹ 23,366 more rows
dfDailyWind %>%
    count(year, winddir) %>%
    group_by(year) %>%
    mutate(pct=n/sum(n)) %>%
    ungroup() %>%
    ggplot(aes(x=year, y=pct)) + 
    geom_line(aes(group=winddir, color=winddir), lwd=1) + 
    labs(x=NULL, 
         y="Percent of Days", 
         title=paste0("Predominant Wind Direction", " (Detroit 1960-2023)"), 
         caption="N (300-060)\nE/W (061-119, 241-299)\nS (120-240)"
         ) + 
    scale_color_discrete(NULL) + 
    geom_hline(data=~summarize(group_by(., winddir), pct=mean(pct)), 
               aes(color=winddir, group=winddir, yintercept=pct), 
               lty=2
               ) +
    lims(y=c(0, NA)) +
    theme(legend.position = "bottom")

Boxplots for maximum windspeed are created:

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Detroit 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Detroit 1960-2023)"
         )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Detroit 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Detroit 1960-2023)"
         )

dfDailyDTW %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Detroit 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyDTW$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 367 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 184 551 915
pacfTemp <- pacf(dfDailyDTW$temperature_2m_max, lag.max=1000)

pacf(dfDailyDTW$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyDTW$precipitation_sum, lag.max=1000)

acf(dfDailyDTW$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  30  56  77 101 118 147 179 197 213 250 266 301 327 358 371 400 428 442 454
## [20] 469 489 508 520 541 553 573 588 605 618 631 644 657 678 690 727 746 769 790
## [39] 820 847 859 879 898 918 929 943 958 974
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  18  32  60  86 114 137 162 189 209 247 261 280 296 325 340 354 386 410 426
## [20] 451 462 494 515 535 550 568 583 603 615 652 680 708 725 742 771 806 831 852
## [39] 875 907 922 934 956 977
pacfPrecip <- pacf(dfDailyDTW$precipitation_sum, lag.max=1000)

pacf(dfDailyDTW$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyDTW$windspeed_10m_max, lag.max=1000)

acf(dfDailyDTW$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1]  28  41  73 142 167 182 202 228 239 281 316 330 354 374 408 476 490 523 537
## [20] 564 620 695 717 737 834 852 873 900 952 977
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  31  87 136 154 169 191 210 253 320 356 376 404 451 483 502 518 556 575 588
## [20] 600 753 839 869 916 935 948
pacfWind <- pacf(dfDailyDTW$windspeed_10m_max, lag.max=1000)

pacf(dfDailyDTW$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)

A boxplot for delta daily temperature is created:

dfDailyDTW %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyDTW %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally larger in magnitude during the colder season

A boxplot for delta daily precipitation is created:

dfDailyDTW %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyDTW %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyDTW %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.

Percentage of cumulative precipitation by volume of daily precipitation is explored:

tmpPlotData <- dfDailyDTW %>%
    select(date, precipitation_sum) %>%
    group_by(precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) 

tmpPlotData %>%
    pivot_longer(cols=-c(precipitation_sum)) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(data=~filter(., name %in% c("cpp", "cpn")), 
              aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
              ) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric")

Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 15+ mm (above about half an inch) of precipitation

Total precipitation by decade is also explored, sorted by daily precipitation:

dfDailyDTW %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
    group_by(decade, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(decade) %>%
    mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
    ungroup() %>%
    ggplot(aes(x=precipitation_sum, y=meancsp)) + 
    geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) + 
    labs(x="Daily precipitation (mm)", 
         y="Cumulative annual precipitation (mm)", 
         title="Average annual precipitation by decade", 
         subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
         ) + 
    scale_color_discrete("Decade")

Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2020s and lightest in the 1960s, driving differences in annual precipitation

Long term daily data is downloaded for Chicago:

# Daily data download for Chicago, IL
testURLDaily <- helperOpenMeteoURL(cityName="Chicago IL", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Central"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=41.84&longitude=-87.68&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
# Download file
if(!file.exists("testOM_daily_ord.json")) {
    fileDownload(fileName="testOM_daily_ord.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_ord.json already exists, skipping download\n")
}
## 
## File testOM_daily_ord.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
ordOMDaily <- formatOpenMeteoJSON("testOM_daily_ord.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01           3                1.6               -3.5
##  2 1960-01-02 1960-01-02          51                4.9               -0.9
##  3 1960-01-03 1960-01-03           3               -0.6               -7.5
##  4 1960-01-04 1960-01-04           2               -5.9              -11.8
##  5 1960-01-05 1960-01-05           3               -6.1              -11.2
##  6 1960-01-06 1960-01-06           1                1                -10.3
##  7 1960-01-07 1960-01-07           3                4.9               -3.2
##  8 1960-01-08 1960-01-08           3                1.6               -2.8
##  9 1960-01-09 1960-01-09          51                7.1               -3.4
## 10 1960-01-10 1960-01-10          51                4.2               -1  
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     41.9     -87.6              369.             -21600 US/Central
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 41.86292
## longitude: -87.64877
## generationtime_ms: 368.9336
## utc_offset_seconds: -21600
## timezone: US/Central
## timezone_abbreviation: GMT-6
## elevation: 180
# Sample records of tibble
ordOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max         <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min         <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max   <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min   <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum          <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum                   <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise                    <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset                     <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ windspeed_10m_max          <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max          <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum    <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…

Variables are converted to proper data type:

dfDailyORD <- ordOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyORD)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max         <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min         <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max   <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min   <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum          <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum                   <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum               <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours        <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise                    <dttm> 1960-01-01 07:18:00, 1960-01-02 07:18:00, …
## $ sunset                     <dttm> 1960-01-01 16:29:00, 1960-01-02 16:30:00, …
## $ windspeed_10m_max          <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max          <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum    <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset_chr                 <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ fct_winddir                <fct> 142, 214, 268, 247, 261, 232, 234, 275, 185…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Chicago 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (Chicago 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (Chicago 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (Chicago 1960-2023)"
         )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (Chicago 1960-2023)"
         )

Boxplots for snow are also created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, snowfall_sum) %>%
    ggplot(aes(x=month, y=snowfall_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily snowfall (liquid equivalent mm)", 
         title="Daily snowfall by month (Chicago 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (Chicago 1960-2023)"
         )

dfDailyORD %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (Chicago 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyORD$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 366 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 180 551 915
pacfTemp <- pacf(dfDailyORD$temperature_2m_max, lag.max=1000)

pacf(dfDailyORD$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyORD$precipitation_sum, lag.max=1000)

acf(dfDailyORD$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  42  80  94 122 167 179 205 216 229 251 268 288 302 332 350 388 413 443 454
## [20] 469 492 528 570 599 612 633 654 690 730 758 770 789 808 820 845 865 881 915
## [39] 926 940 951 976
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  24  49  67  86 114 137 163 181 192 218 231 248 260 281 311 324 340 355 375
## [20] 400 430 451 462 482 521 535 549 560 583 597 616 631 662 687 713 736 757 776
## [39] 795 822 853 875 889 934 949 964 979
pacfPrecip <- pacf(dfDailyORD$precipitation_sum, lag.max=1000)

pacf(dfDailyORD$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyORD$windspeed_10m_max, lag.max=1000)

acf(dfDailyORD$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1] 149 167 182 207 228 239 353 373 408 476 523 538 554 572 594 620 722 737 852
## [20] 883 901 926 953
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1] 112 126 137 154 169 190 217 253 278 376 469 483 506 518 556 589 633 725 739
## [20] 753 847 862 879 916 939 959 976
pacfWind <- pacf(dfDailyORD$windspeed_10m_max, lag.max=1000)

pacf(dfDailyORD$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)

A boxplot for delta daily temperature is created:

dfDailyORD %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=temperature_2m_max-lag(temperature_2m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum temperature")

Daily temperature changes are generally larger in magnitude during the colder season

A boxplot for delta daily wind speed is created:

dfDailyORD %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=windspeed_10m_max-lag(windspeed_10m_max)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in maximum wind speed")

Daily wind speed changes are generally similar in magnitude by season

A boxplot for delta daily precipitation is created:

dfDailyORD %>%
    mutate(month=factor(month.abb[month(date)], levels=month.abb), 
           chg=precipitation_sum-lag(precipitation_sum)
           ) %>%
    filter(!is.na(chg)) %>%
    ggplot(aes(x=month, y=chg)) + 
    geom_boxplot() + 
    labs(x=NULL, y="Daily change in precipitation")

Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21 <- dfDailyORD %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21 %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="none")

Outliers are likely much more important in driving average precipitation than in driving average temperature

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- dfDailyORD %>%
    mutate(doy=pmin(yday(date), 365)) %>% 
    group_by(doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    ungroup() %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(date, doy, contains("_r21"))

df_r21_sd %>%
    na.omit() %>%
    group_by(doy) %>%
    summarize(across(where(is.numeric), .fns=mean)) %>%
    pivot_longer(cols=-c(doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=name, color=name)) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="none")

Means and standard deviations are plotted together:

df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- daily standard deviation", 
         title="Rolling 21-day mean +/- one rolling 21-day sd"
         ) + 
    theme(legend.position="none")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="src") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
    group_by(doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(doy, musig)) %>%
    pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=name, color=name), lwd=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="none")

Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.

Percentage of cumulative precipitation by volume of daily precipitation is explored:

tmpPlotData <- dfDailyORD %>%
    select(date, precipitation_sum) %>%
    group_by(precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) 

tmpPlotData %>%
    pivot_longer(cols=-c(precipitation_sum)) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(data=~filter(., name %in% c("cpp", "cpn")), 
              aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
              ) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric")

Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 20+ mm (nearly an inch) of precipitation

Total precipitation by decade is also explored, sorted by daily precipitation:

dfDailyORD %>%
    select(date, precipitation_sum) %>%
    mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
    group_by(decade, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(decade) %>%
    mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
    ungroup() %>%
    ggplot(aes(x=precipitation_sum, y=meancsp)) + 
    geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) + 
    labs(x="Daily precipitation (mm)", 
         y="Cumulative annual precipitation (mm)", 
         title="Average annual precipitation by decade", 
         subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
         ) + 
    scale_color_discrete("Decade")

Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2010s/2020s and lightest in the 1960s, driving differences in annual precipitation

Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window, for Atlanta, Detroit, and Chicago:

df_r21 <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(doy=pmin(yday(date), 365), 
           src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
           ) %>% 
    group_by(src) %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(src, date, doy, contains("_r21")) %>%
    ungroup()

df_r21 %>%
    na.omit() %>%
    group_by(src, doy) %>%
    summarize(across(where(is.numeric), .fns=mean), n=n(), .groups="drop") %>%
    pivot_longer(cols=-c(src, doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=src, color=src), lwd=1) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean") + 
    theme(legend.position="bottom") + 
    scale_color_discrete(NULL)

Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:

df_r21_sd <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(doy=pmin(yday(date), 365), 
           src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
           ) %>% 
    group_by(src, doy) %>%
    mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
    group_by(src) %>%
    helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>% 
    helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>% 
    helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
    select(src, date, doy, contains("_r21")) %>%
    ungroup()

df_r21_sd %>%
    na.omit() %>%
    group_by(src, doy) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(src, doy)) %>%
    ggplot(aes(x=doy, y=value)) + 
    geom_line(aes(group=src, color=src), lwd=1) + 
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") + 
    theme(legend.position="bottom")

Means and approximate SEM are plotted together:

nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
    bind_rows(df_r21_sd, .id="typ") %>%
    na.omit() %>%
    mutate(musig=c("1"="Mean", "2"="SD")[typ]) %>%
    group_by(src, doy, musig) %>%
    summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
    pivot_longer(cols=-c(src, doy, musig)) %>%
    pivot_wider(id_cols=c(src, doy, name), names_from="musig") %>%
    ggplot(aes(x=doy)) + 
    geom_line(aes(y=Mean, group=src, color=src), lty=2) + 
    geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=src), alpha=0.5) +
    facet_wrap(~name, scales="free_y") +
    labs(x="Day of Year", 
         y="Rolling-21 mean +/- 1 SEM (approx)", 
         title="Rolling 21-day mean +/- 1 SEM (approx)"
         ) + 
    theme(legend.position="bottom")

Percentage of cumulative precipitation by volume of daily precipitation is explored, for three cities:

tmpPlotData <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>% 
    select(src, date, precipitation_sum) %>%
    group_by(src, precipitation_sum) %>%
    summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
    group_by(src) %>%
    mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) %>%
    ungroup()

tmpPlotData %>%
    pivot_longer(cols=-c(src, precipitation_sum)) %>%
    filter(name %in% c("cpp", "cpn")) %>%
    ggplot(aes(x=precipitation_sum, y=value)) + 
    geom_line(aes(group=src, color=src)) + 
    labs(x="Daily precipitation (mm)", y="% total (cumul)") + 
    scale_color_discrete("Metric") + 
    facet_wrap(~c("cpn"="# events", "cpp"="precip")[name])

Around half of days have no precipitation. In all three cities, around 25% of total precipitation comes from the rare day with 15-20+ mm (half inch to nearly an inch) of precipitation

Frequency of precipitation by change in temperature is explored:

bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>%
    select(src, date, precipitation_sum, temperature_2m_max) %>%
    group_by(src) %>%
    mutate(isPrecip=precipitation_sum>0, 
           dTemp=temperature_2m_max-lag(temperature_2m_max), 
           month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    ungroup() %>%
    filter(complete.cases(.)) %>%
    mutate(dTempRound=autoRound(dTemp, rndTo=2.5), dTempRound=pmax(-10, pmin(10, dTempRound))) %>%
    group_by(src, dTempRound) %>%
    summarize(pctPrecip=mean(isPrecip), muPrecip=mean(precipitation_sum), n=n(), .groups="drop") %>%
    mutate(sdpctPrecip=sqrt(pctPrecip*(1-pctPrecip)/n)) %>%
    ggplot(aes(x=dTempRound)) + 
    geom_line(aes(y=pctPrecip, color=src, group=src)) + 
    geom_ribbon(aes(ymin=pctPrecip-sdpctPrecip, ymax=pctPrecip+sdpctPrecip, fill=src, group=src), alpha=0.5) +
    labs(x="Temperature Change from Yesterday (C)\n(rounded to nearest 2.5, capped at +/- 10)", 
         y="% Days with Precipitation > 0\n(mean +/- 1 SE)", 
         title="Frequency of precipitation by change in temperature"
         ) +
    lims(y=c(0, 1)) + 
    scale_fill_discrete(NULL) + 
    scale_color_discrete(guide="none")

Month is a possible confounder, since temperature change and precipitation both vary by season:

bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
    mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>%
    select(src, date, precipitation_sum, temperature_2m_max) %>%
    group_by(src) %>%
    mutate(isPrecip=precipitation_sum>0, 
           dTemp=temperature_2m_max-lag(temperature_2m_max), 
           month=factor(month.abb[month(date)], levels=month.abb)
           ) %>%
    ungroup() %>%
    filter(complete.cases(.)) %>%
    mutate(dTempRound=autoRound(dTemp, rndTo=2.5), dTempRound=pmax(-10, pmin(10, dTempRound))) %>%
    group_by(src, month, dTempRound) %>%
    summarize(pctPrecip=mean(isPrecip), muPrecip=mean(precipitation_sum), n=n(), .groups="drop") %>%
    mutate(sdpctPrecip=sqrt(pctPrecip*(1-pctPrecip)/n)) %>%
    filter(n>=5) %>%
    ggplot(aes(x=dTempRound)) + 
    geom_line(aes(y=pctPrecip, color=src, group=src)) + 
    geom_ribbon(aes(ymin=pctPrecip-sdpctPrecip, ymax=pctPrecip+sdpctPrecip, fill=src, group=src), alpha=0.5) +
    facet_wrap(~month) +
    labs(x="Temperature Change from Yesterday (C)\n(rounded to nearest 2.5, capped at +/- 10)", 
         y="% Days with Precipitation > 0\n(mean +/- 1 SE)", 
         title="Frequency of precipitation by change in temperature"
         ) +
    lims(y=c(0, 1)) + 
    scale_fill_discrete(NULL) + 
    scale_color_discrete(guide="none")

At a glance, there appears to be some association between temperature change and frequency of precipitation, even after controlling for city and month

Long term daily data is downloaded for New Orleans:

# Daily data download for New Orleans, LA
testURLDaily <- helperOpenMeteoURL(cityName="New Orleans LA", 
                                   dailyIndices=1:nrow(tblMetricsDaily),
                                   startDate="1960-01-01", 
                                   endDate="2023-12-31", 
                                   tz="US/Central"
                                   )
## 
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=30.07&longitude=-89.93&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
# Download file
if(!file.exists("testOM_daily_msy.json")) {
    fileDownload(fileName="testOM_daily_msy.json", url=testURLDaily)
} else {
    cat("\nFile testOM_daily_msy.json already exists, skipping download\n")
}
## 
## File testOM_daily_msy.json already exists, skipping download

The daily dataset is loaded:

# Read daily JSON file
msyOMDaily <- formatOpenMeteoJSON("testOM_daily_msy.json")
## 
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily 
## 
## $tblDaily
## # A tibble: 23,376 × 18
##    date       time       weathercode temperature_2m_max temperature_2m_min
##    <date>     <chr>            <int>              <dbl>              <dbl>
##  1 1960-01-01 1960-01-01          61               15.5               10.9
##  2 1960-01-02 1960-01-02          63               19.1               12.3
##  3 1960-01-03 1960-01-03          51               17.2                8.6
##  4 1960-01-04 1960-01-04           3               12.1                6.4
##  5 1960-01-05 1960-01-05          51               17.9               11.3
##  6 1960-01-06 1960-01-06          63               18.2               10.3
##  7 1960-01-07 1960-01-07          51               10.3                6.1
##  8 1960-01-08 1960-01-08           3               11.7                4.6
##  9 1960-01-09 1960-01-09           3               15.8                7.5
## 10 1960-01-10 1960-01-10          51               19.1               12.8
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## #   apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## #   snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## #   windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## #   winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## #   et0_fao_evapotranspiration <dbl>
## 
## $tblHourly
## NULL
## 
## $tblUnits
## # A tibble: 17 × 4
##    metricType  name                       value      description                
##    <chr>       <chr>                      <chr>      <chr>                      
##  1 daily_units time                       "iso8601"  <NA>                       
##  2 daily_units weathercode                "wmo code" The most severe weather co…
##  3 daily_units temperature_2m_max         "deg C"    Maximum and minimum daily …
##  4 daily_units temperature_2m_min         "deg C"    Maximum and minimum daily …
##  5 daily_units apparent_temperature_max   "deg C"    Maximum and minimum daily …
##  6 daily_units apparent_temperature_min   "deg C"    Maximum and minimum daily …
##  7 daily_units precipitation_sum          "mm"       Sum of daily precipitation…
##  8 daily_units rain_sum                   "mm"       Sum of daily rain          
##  9 daily_units snowfall_sum               "cm"       Sum of daily snowfall      
## 10 daily_units precipitation_hours        "h"        The number of hours with r…
## 11 daily_units sunrise                    "iso8601"  Sun rise and set times     
## 12 daily_units sunset                     "iso8601"  Sun rise and set times     
## 13 daily_units windspeed_10m_max          "km/h"     Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max          "km/h"     Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg "     Dominant wind direction    
## 16 daily_units shortwave_radiation_sum    "MJ/m²"    The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm"       Daily sum of ET0 Reference…
## 
## $tblDescription
## # A tibble: 1 × 7
##   latitude longitude generationtime_ms utc_offset_seconds timezone  
##      <dbl>     <dbl>             <dbl>              <int> <chr>     
## 1     30.1     -89.9             7165.             -18000 US/Central
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
## 
## 
## latitude: 30.05272
## longitude: -89.89499
## generationtime_ms: 7164.689
## utc_offset_seconds: -18000
## timezone: US/Central
## timezone_abbreviation: GMT-5
## elevation: 0
# Sample records of tibble
msyOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <int> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max         <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min         <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max   <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min   <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum          <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum                   <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours        <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise                    <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset                     <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ windspeed_10m_max          <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max          <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum    <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…

Variables are converted to proper data type:

dfDailyMSY <- msyOMDaily$tblDaily %>%
    mutate(weathercode=factor(weathercode), 
           sunrise_chr=sunrise, 
           sunset_chr=sunset,
           sunrise=lubridate::ymd_hm(sunrise), 
           sunset=lubridate::ymd_hm(sunset), 
           fct_winddir=factor(winddirection_10m_dominant)
           )
glimpse(dfDailyMSY)
## Rows: 23,376
## Columns: 21
## $ date                       <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time                       <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode                <fct> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max         <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min         <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max   <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min   <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum          <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum                   <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours        <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise                    <dttm> 1960-01-01 07:55:00, 1960-01-02 07:55:00, …
## $ sunset                     <dttm> 1960-01-01 18:10:00, 1960-01-02 18:11:00, …
## $ windspeed_10m_max          <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max          <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum    <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…
## $ sunrise_chr                <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset_chr                 <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ fct_winddir                <fct> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…

Averages by month for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year, month) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    group_by(month) %>%
    summarize(across(-c(year), .fns=mean)) %>%
    pivot_longer(-c(month)) %>%
    ggplot(aes(x=month, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Monthly averages for key metrics (New Orleans 1960-2023)")

Averages by month for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(month, wc, winddir) %>%
    pivot_longer(-c(month)) %>%
    count(month, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=month, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=2) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (New Orleans 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Averages by year for select continuous variables are plotted:

tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)", 
                 "windspeed_10m_max"="4. Windspeed (kph)", 
                 "temperature_2m_max"="1. High Temperature (C)",
                 "temperature_2m_min"="2. Low Temperature (C)"
                 )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
    group_by(year) %>%
    summarize(across(-c(precipitation_sum), .fns=mean), 
              across(c(precipitation_sum), .fns=sum), 
              .groups="drop"
              ) %>%
    pivot_longer(-c(year)) %>%
    ggplot(aes(x=year, y=value)) + 
    geom_line(aes(group=1)) + 
    facet_wrap(~tmpMapNames[name], scales="free_y") + 
    labs(x=NULL, y=NULL, title="Annual averages for key metrics (New Orleans 1960-2023)")

Averages by year for select categorical variables are plotted:

tmpMapNames <- c("wc"="1. Weather Type", 
                 "winddir"="2. Predominant Wind Direction"
                 )

tmpDFPlot <- dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), 
           year=year(date), 
           winddir=case_when(winddirection_10m_dominant>360~"Invalid", 
                             winddirection_10m_dominant>=315~"1. N", 
                             winddirection_10m_dominant>=225~"2. W", 
                             winddirection_10m_dominant>=135~"3. S", 
                             winddirection_10m_dominant>=45~"4. E", 
                             winddirection_10m_dominant>=0~"1. N", 
                             TRUE~"Invalid"
                             ), 
           wc=case_when(weathercode==0~"1. Clear", 
                        weathercode %in% c(1, 2, 3)~"2. Dry", 
                        weathercode %in% c(51, 53, 55)~"3. Drizzle", 
                        weathercode %in% c(61, 63, 65)~"4. Rain", 
                        weathercode %in% c(71, 73, 75)~"5. Snow", 
                        TRUE~"Error"
                        )
           ) %>%
    select(year, wc, winddir) %>%
    pivot_longer(-c(year)) %>%
    count(year, name, value)

tmpPlotFN <- function(x) {
    p1 <- tmpDFPlot %>%
        filter(name==x) %>%
        ggplot(aes(x=year, y=n)) + 
        geom_line(aes(group=value, color=value), lwd=1) + 
        labs(x=NULL, 
             y=NULL, 
             title=paste0(tmpMapNames[x], " (New Orleans 1960-2023)")
             ) + 
        scale_color_discrete(NULL) + 
        theme(legend.position = "bottom")
    return(p1)
    }

gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)

Boxplots for maximum windspeed are created:

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, windspeed_10m_max) %>%
    ggplot(aes(x=month, y=windspeed_10m_max)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Maximum daily wind speed (kph)", 
         title="Maximum daily windspeed by month (New Orleans 1960-2023)"
         )

Boxplots for precipitation are created:

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_hours) %>%
    ggplot(aes(x=month, y=precipitation_hours)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation hours", 
         title="Daily precipitation hours by month (New Orleans 1960-2023)"
         )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, precipitation_sum) %>%
    ggplot(aes(x=month, y=precipitation_sum)) + 
    geom_boxplot(fill="lightblue") + 
    lims(y=c(0, NA)) +
    labs(x=NULL, 
         y="Daily precipitation (mm)", 
         title="Daily precipitation by month (New Orleans 1960-2023)"
         )

Boxplots for temperature are created:

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_max) %>%
    ggplot(aes(x=month, y=temperature_2m_max)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily high temperature (C)", 
         title="Daily high temperature by month (New Orleans 1960-2023)"
         )

dfDailyMSY %>%
    mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
    select(month, temperature_2m_min) %>%
    ggplot(aes(x=month, y=temperature_2m_min)) + 
    geom_boxplot(fill="lightblue") + 
    labs(x=NULL, 
         y="Daily low temperature (C)", 
         title="Daily low temperature by month (New Orleans 1960-2023)"
         )

ACF and PACF are explored for maximum temperature:

acfTemp <- acf(dfDailyMSY$temperature_2m_max, lag.max=1000)

as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 371 736
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 185 546 917
pacfTemp <- pacf(dfDailyMSY$temperature_2m_max, lag.max=1000)

pacf(dfDailyMSY$temperature_2m_max, lag.max=50)

As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year

ACF and PACF are explored for precipitation:

acfPrecip <- acf(dfDailyMSY$precipitation_sum, lag.max=1000)

acf(dfDailyMSY$precipitation_sum, lag.max=50)

as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
##  [1]  18  70  99 113 127 139 155 179 203 227 241 264 327 362 382 403 433 452 465
## [20] 486 530 544 565 577 628 641 670 690 703 721 737 761 792 817 842 854 880 892
## [39] 909 937 967 989
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1]  22  50  64  82 104 131 150 164 176 196 223 247 261 290 307 329 345 376 388
## [20] 412 428 439 456 474 492 526 552 568 580 592 625 636 659 680 700 725 741 757
## [39] 775 797 809 830 846 872 885 906 918 944 971
pacfPrecip <- pacf(dfDailyMSY$precipitation_sum, lag.max=1000)

pacf(dfDailyMSY$precipitation_sum, lag.max=50)

Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value

ACF and PACF are explored for wind speed:

acfWind <- acf(dfDailyMSY$windspeed_10m_max, lag.max=1000)

acf(dfDailyMSY$windspeed_10m_max, lag.max=50)

as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
##  [1] 170 207 227 333 353 542 579 713 725 741 767 893 920 938
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
##  [1] 164 191 365 532 549 583 719 734 869 911 922
pacfWind <- pacf(dfDailyMSY$windspeed_10m_max, lag.max=1000)

pacf(dfDailyMSY$windspeed_10m_max, lag.max=50)

Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.2 for wind speed vs. ~0.8 for temperature)